library(tidyverse)
flights <- nycflights13::flights
# Flights with arrival delay of 2 or more hours
delayed_flights <- flights |> filter(arr_delay >= 120) # `arr_delay` is measured in minutes, so 120 means two hours.
# Flights to Houston (IAH or HOU)
houston_flights <- flights |> filter( dest == "IAH" | dest == "HOU" )
# Flights that departed in summer (July, August, September)
summer_flights <- flights |> filter( month == 7 | month == 8 | month == 9 )
# Flights that arrived more than two hours late but didn’t leave late
late_arrival_on_time_departure <- flights |> filter( arr_delay > 120 & dep_delay <= 0 )
# Flights that departed between midnight and 6am (inclusive)
early_morning_flights <- flights |> filter( (dep_time >= 0 & dep_time <= 600) | dep_time == 2400 )
filter(), arrange(), and distinct()
Classwork 4
Question 1
- Find all flights that had an arrival delay of two or more hours
- Find all flights that flew to Houston (IAH or HOU)
- Find all flights that departed in summer (July, August, and September)
- Find all flights that arrived more than two hours late, but didn’t leave late
- Find all flights that departed between midnight and 6am (inclusive) [Challenging]
Answer:
- filter(arr_delay >= 120): Finds flights with an arrival delay of two or more hours.
- filter( dest == "IAH" | dest == "HOU" ): Filters flights flying to Houston by checking if the dest variable matches “IAH” or “HOU”.
- filter( month == 7 | month == 8 | month == 9 ): Filters flights based on the month variable for July, August, and September.
- filter(arr_delay > 120 & dep_delay <= 0): Filters flights that arrived more than two hours late but left on time or early.
- filter((dep_time >= 0 & dep_time <= 600) | dep_time == 2400): Filters flights departing between midnight and 6am (using military time). The condition dep_time == 2400 is included because, in this data frame, midnight is represented as 2400 rather than 0.
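The chained comparisons above can also be written more compactly. The following is a minimal sketch, assuming the dplyr and nycflights13 packages are installed; the object names houston_in, summer_in, and early_between are made up for illustration and are not part of the classwork.

```r
library(dplyr)

flights <- nycflights13::flights

# x %in% c(a, b) is shorthand for x == a | x == b
houston_in <- flights |> filter(dest %in% c("IAH", "HOU"))
summer_in <- flights |> filter(month %in% 7:9)

# between(x, left, right) is shorthand for x >= left & x <= right
early_between <- flights |>
  filter(between(dep_time, 0, 600) | dep_time == 2400)
```

Each of these should return the same rows as the corresponding filter() call in the answer; the shorthand mainly helps when the list of values grows.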
Question 2
- How many flights have a missing dep_time?
Answer:
missing_dep_time_flights <- flights |> filter(is.na(dep_time))
n_missing_dep_time <- nrow(missing_dep_time_flights)
n_missing_dep_time
[1] 8255
Answer: We use filter(is.na(dep_time)) to find flights where the dep_time is missing, and nrow() to count the number of such flights.
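The same count can be obtained without building an intermediate data frame. This is a sketch under the same assumptions (dplyr and nycflights13 installed); n_missing_base and n_missing_tbl are illustrative names only.

```r
library(dplyr)

flights <- nycflights13::flights

# is.na() returns a logical vector; sum() counts its TRUE values
n_missing_base <- sum(is.na(flights$dep_time))

# The same count expressed as a dplyr pipeline (returns a one-row tibble)
n_missing_tbl <- flights |>
  summarise(n_missing = sum(is.na(dep_time)))
```

Both values should agree with the nrow() result from the answer above.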
Question 3
- Sort flights to find the most delayed flights.
Answer:
# either dep_delay, arr_delay, or both can be used for this task
most_delayed_flights <- flights |> arrange(desc(dep_delay))
Answer: arrange(desc(dep_delay)) sorts flights in descending order of departure delay (dep_delay), placing the flights with the longest departure delays at the top.
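As the code comment notes, arr_delay can be used as well. One possible way to use both is sketched below, with arr_delay breaking ties in dep_delay; the name most_delayed_both and the choice of columns in select() are illustrative assumptions, not part of the classwork.

```r
library(dplyr)

flights <- nycflights13::flights

# Sort by departure delay first, then by arrival delay within ties.
# arrange() places rows with missing values last, even inside desc().
most_delayed_both <- flights |>
  arrange(desc(dep_delay), desc(arr_delay)) |>
  select(year, month, day, carrier, flight, dep_delay, arr_delay)

# Inspect the five most delayed flights
head(most_delayed_both, 5)
```

Keeping only a few columns makes the delay values visible when the result is printed.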